Statistical Approaches to Patent Translation for PatentMT - Experiments with Various Settings of Training Data
نویسندگان
چکیده
This paper describes our experiments and results in the NTCIR-9 Chinese-to-English Patent Translation Task. A series of open source software were integrated to build a statistical machine translation model for the task. Various Chinese segmentation, additional resources, and training corpus preprocessing were then tried based on this model. As a result, more than 20 experiments were conducted to compare the translation performance. Our current results show that 1) consistent segmentation between the training and testing data is important to maintain the performance; 2) sufficient number of good quality bilingual training sentences is more helpful than additional bilingual dictionaries; and 3) the translation effectiveness in BLEU values doubles as the number of bilingual training sentences at the level of 100,000 doubles.
منابع مشابه
Machine Translation System for Patent Documents Combining Rule-based Translation and Statistical Postediting Applied to the NTCIR-10 PatentMT Task
In this article, we describe system architecture, preparation of training data and discussion on experimental results of the EIWA group in the NTCIR-10 Patent Translation Task. Our system is combining rule-based machine translation and statistical postediting. The thing about our new system compared with NTCIR-9 PatentMT task is to implement automatic selecting method from multiple translations...
متن کاملThe NiuTrans Machine Translation System for NTCIR-9 PatentMT
This paper describes the NiuTrans system developed by the Natural Language Processing Lab at Northeastern University for the NTCIR-9 Patent Machine Translation task (NTCIR-9 PatentMT). We present our submissions to the two tracks of NTCIR-9 PatentMT, and show several improvements to our phrase-based Statistical MT engine, including: a hybrid reordering model, large-scale language modeling, and ...
متن کاملUQAM's System Description for the NTCIR-10 Japanese and English PatentMT Evaluation Tasks
This paper describes the development of a Japanese-English and English-Japanese translation system for the NTCIR-10 Patent MT tasks. The MT system is based on the provided training data and Moses decoder. We report our first attempt on statistical machine translation for these pairs of languages and the Patent domain.
متن کاملBBN's Systems for the Chinese-English Sub-task of the NTCIR-9 PatentMT Evaluation
This paper describes the work we conducted for building a statistical machine translation (SMT) system for the ChineseEnglish sub-task of the NTCIR-9 patent machine translation (MT) evaluation [17]. We first applied the various techniques on patent data that we had developed for improving SMT performance on other types of data. Our results show that most of the techniques work on patent documen...
متن کاملSystem Description of BJTU-NLP SMT for NTCIR-9 PatentMT
This paper presents the overview of statistical machine translation systems that BJTU-NLP developed for the NTCIR-9 Patent Machine Translation Task (NTCIR-9 PatentMT). We compared the performance between phrase-based translation model and factored translation model in our Patent SMT of Chinese to English and English to Japanese. Factored translation model was proposed as an extended phrase-base...
متن کامل